Modern Time Series Forecasting with R
Packages & Datasets

Marco Zanotti

Packages

The course focuses on an R toolkit organized by role. For data wrangling and visualization, we use the tidyverse to cover the essentials, while timetk streamlines time-aware feature engineering, visualization, and preprocessing.
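As a flavor of what timetk adds on top of the tidyverse, the sketch below builds a small simulated daily series (hypothetical data, for illustration only) and augments it with calendar features and lags; `daily_tbl` and the chosen lags are assumptions, not course material.

```r
library(tidyverse)
library(timetk)

# Simulated daily series -- hypothetical example data
set.seed(123)
daily_tbl <- tibble(
  date  = tk_make_timeseries("2023-01-01", by = "day", length_out = 90),
  value = cumsum(rnorm(90))
)

# timetk adds calendar features and lags in a pipe-friendly way
features_tbl <- daily_tbl %>%
  tk_augment_timeseries_signature(.date_var = date) %>%
  tk_augment_lags(.value = value, .lags = c(1, 7))
```

The same pipeline style carries over to the visualization helpers (e.g. `plot_time_series()`), which is why timetk pairs naturally with the tidyverse throughout the course.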

We organize modeling work using tidymodels, which brings together workflows, parsnip, tune, dials, recipes, rsample, and yardstick for a consistent interface to model specification, tuning, resampling, preprocessing, and evaluation. For time series specifically, modeltime extends tidymodels with forecasting workflows, and its companions add backtesting (modeltime.resample), ensembling (modeltime.ensemble), and AutoML integration (modeltime.h2o).
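A minimal sketch of how these pieces fit together, assuming a hypothetical daily series `daily_tbl` (simulated here) and a single auto-ARIMA model; the split horizon and model choice are illustrative, not prescribed by the course.

```r
library(tidymodels)
library(modeltime)
library(timetk)

# Hypothetical example data
set.seed(123)
daily_tbl <- tibble(
  date  = tk_make_timeseries("2023-01-01", by = "day", length_out = 90),
  value = cumsum(rnorm(90))
)

# rsample-style time series split (last 14 days held out)
splits <- time_series_split(daily_tbl, assess = "14 days", cumulative = TRUE)

# parsnip-style specification, fit on the training window
model_fit <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, data = training(splits))

# modeltime table -> calibrate on the test window -> accuracy metrics
modeltime_table(model_fit) %>%
  modeltime_calibrate(new_data = testing(splits)) %>%
  modeltime_accuracy()
```

The same table/calibrate/accuracy pattern scales to many models at once, which is what makes modeltime's extension of tidymodels convenient for forecasting comparisons.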

For deep learning we bridge to Python with reticulate and access Amazon’s GluonTS algorithms through modeltime.gluonts, enabling state-of-the-art probabilistic forecasting within the same tidymodels workflow.
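As a rough sketch of that bridge, modeltime.gluonts exposes GluonTS models through the usual parsnip-style specification; the parameter values below are placeholders, and the code assumes a working Python environment with GluonTS (set up via `install_gluonts()`), so fitting is not shown here.

```r
library(modeltime.gluonts)

# One-time setup of the Python backend (commented out; run once)
# install_gluonts()

# DeepAR specification in the familiar parsnip style --
# id/freq/prediction_length values are illustrative placeholders
model_spec <- deep_ar(
  id                = "id",
  freq              = "D",
  prediction_length = 14,
  epochs            = 5
) %>%
  set_engine("gluonts_deepar")
```

Because the result is an ordinary model specification, it drops into the same modeltime table/calibrate/forecast workflow as the classical models.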

Time series foundation models are discussed, and Nixtla’s TimeGPT is tested via the timegptr package. Finally, time series agents are explored using TimeCopilot.

All the required packages can be installed and loaded using the following code:

source("src/R/utils.R")

pkgs <- c(
  "devtools", "remotes",
  "tidyverse", 
  "timetk", 
  "forecast", "prophet", "smooth", "thief",
  "glmnet", "earth", "kernlab", "kknn",
  "randomForest", "ranger", "xgboost", "bonsai", "lightgbm",
  "Cubist", "rules",
  "tidymodels", "modeltime", "modeltime.ensemble",
  "parallel", "doFuture", "tictoc",
  "reticulate"
)
install_and_load(pkgs)

# Install CatBoost from source for Linux
devtools::install_url(
  "https://github.com/catboost/catboost/releases/download/v1.0.0/catboost-R-Linux-1.0.0.tgz",
  INSTALL_opts = c("--no-multiarch", "--no-test-load")
)

# Install packages from GitHub
remotes::install_github("business-science/modeltime.gluonts")
remotes::install_github("business-science/modeltime.h2o")

Datasets

Email Subscribers

A company decided to change the selling process of its products, moving from a completely physical store approach to a more digital and modern solution. Hence, it decided to open an online web store that integrates an e-commerce platform, where its “virtual” customers can buy all the merchandise.
In order to monitor this new business solution, it adopted a few well-known data analytics tools.

Google Analytics has been set up on the web store pages to collect data related to page views, sessions and organic searches. This could potentially help the company to understand whether its website is gaining popularity.

Moreover, MailChimp is used to track all the customers that buy a product and subscribe to the web store.

Finally, marketing events like discount programs and new product launches are promoted through several social network channels.

All these data are stored in the company database and can be used to analyze the factors that impact the web store sales.

M4 Competition Hourly

The M4 Competition is a well-known time series forecasting competition organized by Spyros Makridakis. The competition provides a large dataset of time series from various domains, including finance, economics, and demographics. The goal of the competition is to develop accurate forecasting models for these time series.

https://www.unic.ac.cy/iff/research/forecasting/m-competitions/m4/

We will use a sample of the M4 Hourly dataset, which consists of hourly time series data. The dataset contains multiple time series, each identified by a unique ID.